Accelerated focused crawling through online relevance feedback
The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page with respect to an information need, based on information limited to the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier and reduce the number of irrelevant pages that are fetched and discarded.
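The frontier re-prioritization described above can be sketched as a relevance-ordered priority queue. This is a minimal illustration, not the paper's system: the class name, URLs, and scores are hypothetical, and the relevance predictor (learned from features of the HREF source page) is assumed to exist elsewhere and simply supply a number per URL.

```python
import heapq

class CrawlFrontier:
    """Priority queue of unvisited URLs, ordered by predicted relevance
    of the target page (estimated from the HREF source page).
    Higher-scoring URLs are fetched first."""

    def __init__(self):
        self._heap = []   # min-heap of (-score, insertion_order, url)
        self._order = 0   # tie-breaker keeps equal-score URLs in FIFO order
        self._seen = set()

    def add(self, url, predicted_relevance):
        if url in self._seen:       # never enqueue the same URL twice
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-predicted_relevance, self._order, url))
        self._order += 1

    def next_url(self):
        """Pop the most promising unvisited URL, or None if exhausted."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url

frontier = CrawlFrontier()
frontier.add("http://example.com/sports", 0.2)
frontier.add("http://example.com/ml-paper", 0.9)
frontier.add("http://example.com/about", 0.5)
print(frontier.next_url())  # → "http://example.com/ml-paper"
```

Online relevance feedback would then update the predictor after each fetched page is judged relevant or not, changing the scores assigned to subsequently discovered links.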
Real-time augmented reality filters expressive of user sentiment
Body language and facial expressions are an important component of human communication. Some messaging applications include features to send emoji, animated GIFs, etc. to express emotion. However, such content does not include the user’s image. This disclosure describes techniques that enable users to choose augmented reality effects that are added to a user’s image and that help users express an emotion.
Enhanced classification through exploitation of hierarchical structures
Humans often organize information by encoding it in structures that link together entities such as concepts, objects, properties, etc. Among the various structures possible, hierarchies are commonly used. For instance, taxonomies of categories commonly employ hierarchies to indicate that one category “is a” type of another. The Yahoo! Web Directory and the Open Directory Project are two examples of large taxonomies where topics are hierarchically arranged. Hierarchies are also used to recursively decompose composite objects into their constituent parts. Examples of this are webpages that can be parsed and then represented as DOM trees, where the DOM nodes correspond to sections of the webpages.
In this thesis we argue that these hierarchical relationships between entities can be exploited to facilitate common data mining tasks defined upon them, such as automated classification. Specifically, we show that the information encoded in these hierarchies can be reduced to constraints on class membership scores, which can then be enforced as a post-processing step to enhance the accuracy of classification. We demonstrate our ideas and algorithms on three real-world tasks.
First, we tackle the problem of classification into hierarchical taxonomies. We show how different taxonomy structures can be translated into constraints on the outputs of classifiers learned at the nodes of the hierarchy. In addition, we give algorithms to optimally enforce these constraints and show that this results in improved classification accuracy. In cases where the taxonomies are not available, we give an approach to automatically derive hierarchical relationships amongst a flat set of categories. Next, we work on the problem of detecting noisy (templated) parts of webpages. We give algorithms that rate each section of a webpage in terms of how templated it is. Then we show that smoothing the output of these template classifiers over the DOM-tree hierarchy improves the template detection performance of our system. Finally, we investigate the task of segmenting websites into topically cohesive regions. We define a framework and, within it, a set of measures that characterize good segmentations, and give an efficient algorithm to find the best segmentation within this framework.
We formalize the problem of enforcing constraints on the outputs of classifiers as regularized isotonic or unimodal regression on rooted trees; these are generalizations of the classic isotonic regression problem. The nature of the constraints as well as the cost functions differs in each of the applications mentioned above. For all these formulations we give efficient algorithms to optimally smooth the classifier outputs. These novel formulations and algorithms may be of interest independent of the applications in this thesis.
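To illustrate what "enforcing hierarchy constraints as a post-processing step" can look like, here is a deliberately simple sketch: a top-down pass that clips each node's class-membership score so a child never exceeds its parent. This is only a heuristic projection under that one assumed constraint, not the optimal regularized isotonic-regression algorithm of the thesis, and the taxonomy and scores are hypothetical.

```python
def clip_down(tree, scores, root):
    """Enforce child <= parent on class-membership scores over a rooted
    tree by a single top-down clipping pass. Illustrative only: NOT the
    thesis's optimal regularized algorithm.
    `tree` maps node -> list of children; `scores` maps node -> float."""
    out = dict(scores)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            out[child] = min(out[child], out[node])  # child can't exceed parent
            stack.append(child)
    return out

# Hypothetical taxonomy: Science -> {Physics, Biology}, Physics -> {Optics}.
tree = {"Science": ["Physics", "Biology"], "Physics": ["Optics"]}
scores = {"Science": 0.6, "Physics": 0.8, "Biology": 0.3, "Optics": 0.9}
smoothed = clip_down(tree, scores, "Science")
# Physics and Optics are pulled down to the Science score of 0.6.
```

The thesis's formulations instead choose the constrained scores that minimize a regularized cost relative to the raw classifier outputs, rather than clipping greedily.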
CLUMP: A scalable and robust framework for structure discovery
We introduce a robust and efficient framework called CLUMP (CLustering Using Multiple Prototypes) for unsupervised discovery of structure in data. CLUMP relies on finding multiple prototypes that summarize the data. Clustering the prototypes enables our algorithm to scale up to extremely large and high-dimensional domains such as text data. Other desirable properties include robustness to noise and parameter choices. In this paper, we describe the approach in detail, characterize its performance on a variety of datasets, and compare it to some existing model selection approaches.
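The two-stage idea — summarize the data with many prototypes, then cluster the prototypes — can be sketched as follows. This is a hedged stand-in, not the authors' exact CLUMP procedure: plain k-means supplies the prototypes, and a simple distance-threshold merge (connected components of the proximity graph) stands in for the prototype-clustering stage; the data and parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means with deterministic strided initialization;
    returns the k centroids, which serve as prototypes of the data."""
    centers = X[:: max(1, len(X) // k)][:k].astype(float).copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def merge_prototypes(protos, threshold):
    """Stage 2 (stand-in): group prototypes whose pairwise distance is
    below `threshold` via union-find on the proximity graph."""
    parent = list(range(len(protos)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(protos)):
        for j in range(i + 1, len(protos)):
            if np.linalg.norm(protos[i] - protos[j]) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(protos))]

# Two well-separated square blobs of 25 points each: 6 prototypes
# summarize the 50 points, then merge back into the 2 true structures.
grid = np.array([[dx, dy] for dx in np.linspace(-0.3, 0.3, 5)
                          for dy in np.linspace(-0.3, 0.3, 5)])
X = np.vstack([grid, grid + 5.0])
protos = kmeans(X, k=6)
labels = merge_prototypes(protos, threshold=2.0)
print(len(set(labels)))  # → 2 discovered structures
```

The scalability claim rests on stage 2 operating on only k prototypes rather than all n points, so the quadratic pairwise step costs O(k²) instead of O(n²).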